explanation" (Tversky and Kahneman, 1973) for subjects' overconfidence in
estimating the probability of specified hypotheses. The conjecture is that
subjects have difficulty retrieving unspecified hypotheses; a complete set of
candidate unspecified hypotheses is unavailable during assessment. Therefore,
the underpopulated set of unspecified hypotheses is regarded as less probable
and the specified set is regarded as more probable. A control group in this
study replicated previous findings of overconfidence for specified hypotheses.
Two manipulations to increase the availability of unspecified hypotheses were
investigated. One manipulation involved explicitly requesting subjects to
populate the unspecified set. The other manipulation consisted of computer
presentation of candidate unspecified hypotheses. Although, in a normative
sense, neither manipulation should have affected judgements, results indicated
that assessment overconfidence for both experimental groups was reduced. These
results support our conjecture that the availability heuristic is at least
partially responsible for subjects' excessive behavior in evaluating specified
hypotheses.




The availability explanation of excessive plausibility assessments


A necessary precursor to any decision analysis is an identification of possible
hypotheses to be considered, a process we term "hypothesis generation." This
process involves a partition of all possible hypotheses appropriate for the
problem into two sets, the set of "specified" (generated) hypotheses and the
complement of this set, the set of "unspecified" hypotheses. The result of a
previous study (Gettys, Fisher and Mehle, 1978) was that subjects were
overconfident in assessing sets of specified hypotheses and underconfident in
assessing sets of unspecified hypotheses. In this previous study, and in the
current study, subjects estimated their feelings of certainty by judging the
odds of three specified possible majors of an unknown undergraduate student at
the University of Oklahoma and a fourth "catch-all" possibility corresponding
to the alternative that the unknown student had some other major. The data for
these problems were classes that the unknown student had taken. The veridical
values were obtained by analyzing the computerized student master record file
at the University of Oklahoma. A magnitude estimation procedure was used to
obtain the subjects' responses.

In other contexts, the overconfidence bias has received considerable attention
recently: Lichtenstein, Fischhoff and Phillips (1977) review several studies
which address this issue. Kahneman and Tversky (in press) listed lack of
expertise, insensitivity to the quality of data, oversensitivity to data
consistency, conditionality (adopting unstated assumptions) and anchoring as
contributors to the overconfidence bias.




The purpose of the present study was to investigate a factor which may
contribute to overconfidence in hypothesis-generation tasks, the "availability
heuristic." We postulate that subjects may have underestimated the likelihood
of the catch-all alternative in the Gettys et al. (1978) study simply because
they had difficulty populating the catch-all alternative with hypotheses.
Since some catch-all hypotheses would not be available and thus not evaluated
when making certainty estimates, subjects would tend to underestimate the
likelihood of catch-all sets.

This "availability" explanation of subjects' excessive odds estimates of
specified hypotheses is related to, but not identical to, the availability
heuristic described by Tversky and Kahneman (1973, 1974). Tversky and Kahneman
reported evidence that subjects were influenced by the availability in memory
of instances of an event when evaluating the probability of that event. In the
context of hypothesis generation, subjects must be able to judge the likelihood
of the set of all unspecified hypotheses in order to accurately assess the
likelihood of the complement of this set, the set of specified hypotheses. If
subjects simply cannot recall many of the hypotheses of the unspecified set, it
stands to reason that their likelihood estimates for the unspecified set should
be conservative.

The current study includes a partial replication of the Gettys et al. (1978)
study and two additional manipulations to test the availability explanation.




The two manipulations were designed to increase the availability of catch-all
alternatives. Our prediction was that increasing the availability of catch-all
possibilities would increase catch-all assessments, reducing subjects'
overconfidence in the specified sets. A Control group was presented problems in a
computerized  format,  one  datum  per  problem.  The  subjects'  basic  task  was  to 
estimate  the  odds  for  three  specified  hypotheses  and  the  catch-all  alternative. 
Subjects  in  one  experimental  condition,  the  "Exemplar"  group,  were  presented 
the  Control  subjects'  display  plus  five  exemplar  hypotheses.  Subjects  in  the 
other  experimental  condition,  the  "Retrieval"  group,  were  asked  to  generate 
candidate  hypotheses  for  the  catch-all  before  making  the  same  type  of  odds 
estimates  as  subjects  in  the  other  groups. 


We  examined  the  two  experimental  manipulations  partially  for  their  applied 
implications.  Although  either  could  be  implemented  in  an  applied  setting,  the 
Retrieval  group  procedure  of  encouraging  subjects  to  populate  catch-all  sets 
with  possible  hypotheses  would  be  preferred  over  the  Exemplar  procedure  if  they 
were equally effective. The Retrieval manipulation is essentially only a
change  in  instructions  or  training.  The  Exemplar  procedure  requires  equipment 
to  display  the  exemplar  catch-all  hypotheses  during  the  hypothesis-generation 
task  and  the  generation  of  extensive  exemplar  lists  prior  to  the  task. 
Particularly  in  nonrecurring  situations,  obtaining  high-quality  exemplars  may 
be  difficult  or  impossible. 


Method 


Subjects

A total of 48 subjects participated in this study. All were undergraduate
students at the University of Oklahoma enrolled in the Introductory Psychology
course. Subjects were randomly assigned to the three conditions, 16 subjects
per condition. Half of the subjects in each condition were female and half
were male.

Apparatus

The experimental sessions were under the control of an intelligent graphics
terminal having color graphics capability. The computer was a Compucolor 8001,
manufactured by the Intelligent Systems Corporation, Norcross, GA. Control and
Exemplar group subjects entered odds estimates using the terminal's light pen.
Retrieval subjects entered possible hypotheses on the terminal's keyboard
before entering odds estimates with the light pen. The odds estimates entered
with the light pen were assumed to be proportional to the probabilities of the
hypotheses, or sets of hypotheses, given the data and could be converted to
probability measures through a simple normalization.
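
The normalization mentioned above can be sketched in a few lines. This is only
an illustration under the stated assumption that responses are proportional to
probabilities; the function name and the example line lengths are hypothetical
and are not taken from the study's software or data.

```python
def normalize(estimates):
    """Convert magnitude estimates (assumed proportional to probabilities)
    into probabilities that sum to 1."""
    total = sum(estimates)
    if total <= 0:
        raise ValueError("at least one estimate must be positive")
    return [e / total for e in estimates]


# Hypothetical line lengths for three specified majors and "All Others".
# The most likely alternative is set to the modulus of 100.
probabilities = normalize([100, 50, 25, 25])
print(probabilities)  # [0.5, 0.25, 0.125, 0.125]
```

Only the ratios among the four responses matter under this assumption, so any
common rescaling of the line lengths yields the same probabilities.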

Problem  Generation 

A data base consisting of 166,853 records was used to generate 30 problems for
this study. The data base was created by accessing the computer master record
file for nontransfer undergraduate students at the University of Oklahoma. The
results of our analyses of this data base were frequencies which may be
considered to be the actual population parameters. Classes were selected to
have a reasonably large enrollment. Problems were selected so that the
probability of the set of three specified hypotheses varied from fairly small
to fairly large and so that the catch-all set of unspecified hypotheses was
fairly rich.


Example Problem

Following is a description of three subjects' responses to an example problem
to provide a concrete illustration of the procedure. The subjects' responses
were to problem 24, which involved the datum: "Aviation 1113, Introduction to
Aviation," a three-credit freshman-level course. This datum represents a class
taken by an undergraduate student having an unknown major. Subjects were asked
to evaluate the relative likelihood of these four possibilities: Social Work,
Psychology, Education and "all others," the catch-all alternative. The
veridical probabilities were, respectively, 0, 2.7, 6.6 and 90.7 percent.

Subject 2, in the Control condition, gave magnitude estimation responses which,
when converted to percent probabilities, were: 50.7 for Social Work, 14.2 for
Psychology, 18.8 for Education and 16.2 for all others.

Subject 1 was assigned to the Exemplar condition and for this problem was shown
a list of the following majors as possibilities in the catch-all set:
Business, Journalism, University College Unclassified, Political Science and
Nursing. Together, these five possibilities accounted for 56.4 percent of
students who had enrolled in Aviation 1113. This subject's responses,
converted to percent probabilities, were 31.3 percent for Social Work, 25.0
percent for Psychology, 20.0 percent for Education and 23.8 percent for all
others.

Subject 3, a member of the Retrieval group, suggested the following set of
majors as containing all possibilities having a probability greater than zero:
Business, Journalism, Home Economics, Sociology and Chemistry. The veridical
probability of this collection of five hypotheses is 42.0 percent. Subject 3's
responses converted to percent probabilities were: 39.4 percent for Social
Work, 9.8 percent for Psychology, 30.7 percent for Education and 20.1 percent
for all others.

Procedure 

Each session began with instructions presented on the terminal's CRT. In each
task, the study was subject-paced. The Control and Exemplar group subjects
generally required one hour to complete the instructions and the experimental
session while the Retrieval group subjects required two hours. During the
experimental session, each subject was presented 30 problems in a random order.

Each problem contained three specified hypotheses concerning the possible
major of an unknown University of Oklahoma undergraduate student and a fourth
"catch-all" alternative that the unknown student had some other major. Also
provided was a course that the unknown student had taken, described by the
course number, department and title.

The instructions were designed to provide graduated training
in the experimental task. Subjects were first introduced to the operation of
the light pen, then were trained in the magnitude estimation procedure using a
concrete problem involving estimation of the areas of rectangles and a more
abstract problem involving prediction of the outcome of the next presidential
election. The final phase of the instructions involved ten problems of the
same type as those used in the actual experimental session.

Experimental Tasks. The display for the experimental task of the Control
group subjects consisted of a boxed area at the top of the CRT containing the
course number, department and title of the class the unknown student had taken.
Below the box were four horizontal lines. The top-most three lines were
labeled with three specified majors. The fourth line, labeled "All Others,"
corresponded to the catch-all alternative. Subjects made magnitude estimation
responses by adjusting the length of a colored segment on the horizontal lines
with a light pen. The horizontal lines were labeled with calibration markings
at 0, 25, 50, 75 and 100, with 100 corresponding to the full length of the
line. Thus, the subjects' modulus for the magnitude estimation procedure was
100, the length of the line identified with the most likely alternative. The
specified majors for each problem were the same for all groups, but problem
presentation order was randomized across subjects. Also, the order of the
three specified hypotheses on the display was randomized for each problem in
all conditions.

Exemplar group subjects saw virtually the same instructions and problems as the
Control group subjects, except that the computer inserted the word "Including:"
and a list of five candidate alternatives below the label "All Others" on the
bottom line of the CRT display.

Unlike the Control and Exemplar group subjects, the Retrieval group subjects
were shown two displays for each problem, rather than just one. Otherwise the
instructions and experimental problems were identical to those for the other
two groups. The first page display contained the data set off in a box at the
top of the screen. Subjects were instructed to enter possible exemplars for
the catch-all alternative until they believed their list covered virtually all
possibilities in the catch-all having probabilities greater than zero. On the
basis of a pilot study, the software was written to accept no more than five
catch-all possibilities. (Subjects in the main study seldom entered even five
possibilities. The mean number of possibilities entered by subjects in this
condition was only 1.87.) For this subtask, the computer assisted subjects
with spelling to ensure that the majors would be correctly spelled for further
processing. The second page displays for the Retrieval group were identical to
the displays seen by Exemplar group subjects, except that page one responses
were listed as candidate catch-all majors, replacing the computer-generated
list supplied to Exemplar group subjects.


Results  and  Discussion 


For an initial ANOVA, subjects' magnitude estimation responses for the three
specified hypotheses and the catch-all alternative were normalized to
probabilities, and the probabilities assigned to the catch-all alternatives
were used as scores. A conservative catch-all response corresponds to excessive
assessments of the collection of specified hypotheses and vice versa. The
factors for this analysis were the 30 problems, subjects, the three groups
(Control, Exemplar and Retrieval) and a female/male blocking factor.

Overall, the pattern of excessive estimates for the specified hypotheses and
conservative estimates for the catch-all hypotheses observed in the previous
study (Gettys et al., 1978) was replicated here; also, both experimental
manipulations reduced conservatism in a mean sense. The group means were:
Control, 17.6 percent; Exemplar, 27.1 percent and Retrieval, 23.4 percent,
compared to a veridical mean catch-all probability of 48.9 percent. The group
main effect was significant, F(2, 42) = 7.59, p < .01. The male/female
blocking factor was not significant. The main effect due to problems was
significant, F(29, 1218) = 23.0, p < .001. The major difference between
problems was the veridical probability of the catch-all alternative, and scores
for this analysis were subjects' estimates of this probability. Subsequent
analyses examine the significant problems effect and its interaction with the
experimental manipulation, the groups effect, in more detail.


No interactions among the factors of this analysis achieved statistical
significance, except the problem by group interaction, F(58, 1218) = 2.77, p <
.001. This interaction suggests that the experimental manipulation did not
have a simple additive effect on responses. An approach to investigating this
significant interaction was to introduce an additional factor into the ANOVA.
The "diagnosticity" factor was created by sorting problems into three groups on
the basis of the veridical probabilities of the catch-all sets. These three
categories were "low", "medium" and "high" diagnosticity, according to whether
the veridical probability of the catch-all sets was low, medium or high.
Table 1 shows the means obtained for subjects in each of the three conditions,
Control, Exemplar and Retrieval, over the three diagnosticity categories of
problems. The mean probabilities of the catch-all alternatives are contrasted
with the veridical values.
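
The sorting step behind the diagnosticity factor can be illustrated as follows.
This is a sketch only: it assumes equal-sized groups (the Figure 2 caption
reports ten problems per level), and the function name and the six example
probabilities are hypothetical.

```python
def diagnosticity_levels(catch_all_probs):
    """Label each problem 'low', 'medium' or 'high' according to the tercile
    of its veridical catch-all probability (equal-sized groups assumed)."""
    n = len(catch_all_probs)
    size = n // 3
    # Problem indices ordered from smallest to largest catch-all probability.
    order = sorted(range(n), key=lambda i: catch_all_probs[i])
    names = ("low", "medium", "high")
    labels = [None] * n
    for rank, idx in enumerate(order):
        # Any remainder from an uneven split falls into the top group.
        labels[idx] = names[min(rank // size, 2)]
    return labels


# Hypothetical veridical catch-all probabilities for six problems.
print(diagnosticity_levels([0.9, 0.1, 0.5, 0.2, 0.6, 0.8]))
# ['high', 'low', 'medium', 'low', 'medium', 'high']
```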


(Insert  Table  1 about  here) 


In general, subjects increased the magnitude of their responses as
diagnosticity increased. The means for the diagnosticity categories were: low,
18.8 percent; medium, 24.1 percent and high, 25.3 percent. The diagnosticity
main effect represented by these means was significant, F(2, 84) = 49.73,
p < .001. Since the problems by group interaction was significant in the
previous analysis, it should not be surprising that the diagnosticity by group
interaction was significant in this analysis, F(4, 84) = 7.58, p < .001. The
interaction of the blocking variable (male/female) with diagnosticity and the
three-way interaction were not significant.


Table 1

Mean Catch-All Probabilities
Expressed as Percents

                                   Group
Diagnosticity   Control   Exemplar      Retrieval     Veridical
Low             16.6      21.1          18.7          24.9
Medium          18.2      29.1          24.9          49.5
High            18.0      [illegible]   [illegible]   [illegible]

A more fine-grained analysis of the differential impact of the varied veridical
probabilities of the catch-all sets on the three groups was undertaken using
two approaches which yielded converging results. One approach was in the
Bayesian tradition for examining the quality of probabilistic responses.
Individual responses of each subject for each problem were transformed to log
(base 10) odds, with the (posterior) odds being expressed as the ratio of the
estimate for the set of three specified hypotheses divided by the likelihood
estimate for the catch-all set of unspecified hypotheses. These transformed
scores were compared to the veridical log odds in a correlational analysis for
each group. Results are listed in Table 2. For these calculations, responses
resulting in undefined (infinite) log odds were deleted. Also tabled is the
number of responses deleted for this reason in each group.


(Insert Table 2 about here)
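
The log odds transformation and the comparison with veridical values can be
sketched as below. The sketch assumes that subjects' log odds are regressed on
the veridical log odds (so that a slope of 1.0 means the responses track the
population values) and that a pair is dropped whenever either log odds value is
undefined. All function names and example values are hypothetical.

```python
import math


def log_odds(p_specified, p_catch_all):
    """log10 of the specified-to-catch-all odds, or None when undefined."""
    if p_specified <= 0 or p_catch_all <= 0:
        return None
    return math.log10(p_specified / p_catch_all)


def fit_line(xs, ys):
    """Least-squares slope and intercept of ys on xs, plus Pearson r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy / math.sqrt(sxx * syy)


def compare_to_veridical(judged, veridical):
    """judged and veridical are lists of (p_specified, p_catch_all) pairs.
    Returns ((slope, intercept, r), number_of_pairs_deleted)."""
    xs, ys, deleted = [], [], 0
    for j, v in zip(judged, veridical):
        lj, lv = log_odds(*j), log_odds(*v)
        if lj is None or lv is None:
            deleted += 1  # undefined (infinite) log odds are dropped
        else:
            xs.append(lv)
            ys.append(lj)
    return fit_line(xs, ys), deleted


# Hypothetical responses: the third judgement gives the catch-all zero weight.
judged = [(0.9, 0.1), (0.8, 0.2), (1.0, 0.0), (0.6, 0.4)]
veridical = [(0.9, 0.1), (0.8, 0.2), (0.5, 0.5), (0.6, 0.4)]
(slope, intercept, r), deleted = compare_to_veridical(judged, veridical)
```

In this toy case the three retained judgements equal the veridical values, so
the fitted slope is 1 and the intercept is 0, with one pair deleted.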


This analysis sheds some light on the nature of the significant problems by
group and diagnosticity by group interactions noted earlier. The variability
in the Control group responses is nearly unrelated to the variability in the
veridical values. The slope of the regression line for the Control group is
nearly flat, .100. Both experimental manipulations reduced the conservatism
bias in responses, but not as an additive constant; subjects in both
experimental groups were more inclined to vary their estimates somewhat in
accord with variations in the population parameters. In comparison to the
Control group, the square of the Pearson r was over five times as large for
both the Exemplar and Retrieval groups, with the Exemplar group showing
somewhat of an advantage. There was an increase in the slope of the regression
lines for both experimental groups also. By way of reference, the regression
line slope would be 1.0 if subjects were perfectly calibrated. The regression
lines for the three groups are plotted on the same graph for comparison in
Figure 1.


Table 2

Correlational Analysis of Log Odds Scores

(Columns: Control, Exemplar, Retrieval. Rows: correlation r and r squared;
regression slope and intercept; number deleted; number of pairs. The cell
values are not legible in the source.)


As might be expected from the low correlations obtained, the scatter plots for
these regression lines are fairly uninformative. Another approach to
illustrating the differences between groups was to consolidate the scattered
problem means into diagnosticity means, making use of the additional factor
introduced for the second ANOVA. Figure 2 is a graph of these means.


(Insert Figures 1 and 2 about here.)


To examine the diagnosticity factor means in terms of log odds, the problem
mean catch-all probability for each group was transformed to log odds. These
transformed means were averaged within the three diagnosticity categories to
obtain the points plotted in Figure 2. The pattern of decreased overconfidence
for both experimental groups is in evidence in the second figure also. There
is a "fanning" tendency across the diagnosticity factors, with the Exemplar
group's superiority to the other groups maintained over all three diagnosticity
levels.


Figure 1. Regression lines for the Control, Exemplar and Retrieval groups
contrasted with the veridical line. The scores used in the regression analysis
were calculated as the log (base 10) of the ratio of assessments for specified
hypotheses divided by assessments for catch-all sets. The symbols do not
represent significant points on the lines; they were plotted only to
distinguish among the regression lines. The solid line represents the
performance of an optimal subject producing veridical responses. Each
regression line summarizes approximately 480 scores.

Figure 2. Group means for each level of the diagnosticity factor, expressed as
log (base 10) odds. Each level of the diagnosticity factor included ten
distinct problems.



An alternative approach to examining the trials by group interaction in more
detail was carried out by calculating Brier scores for each of the three groups
and examining a partition of these scores (Murphy, 1973). The Brier score is a
member of a class of measures of probabilistic estimates called "proper scoring
rules." The principal application of proper scoring rules in psychology has
been as feedback mechanisms in the training of probabilistic assessors; see
Pickhardt and Wallace (1974) and Lichtenstein and Fischhoff (Note 1) for
examples. Our motivation for investigating the Murphy partition was to examine
the effect of the experimental manipulations on each component. The names of
the components and their relations to the Brier score are: Brier score =
Uncertainty + Reliability - Resolution. See Murphy (1973) and Lichtenstein and
Fischhoff (1977) for discussions of the interpretations of these components.
Results of the calculations are shown in Table 3.
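
For readers unfamiliar with the partition, the sketch below computes the
textbook two-state form with 0/1 outcomes. It is not the modified procedure
used in this study, which scored responses against population parameters. It
assumes forecasts have already been rounded to interval values, so that each
distinct forecast value forms one category; the function name and the data are
hypothetical.

```python
from collections import defaultdict


def murphy_partition(forecasts, outcomes):
    """Two-state Brier score and its partition into uncertainty, reliability
    and resolution, so that brier = uncertainty + reliability - resolution.

    forecasts: probability given to state A, already rounded to intervals.
    outcomes:  1 if state A obtained, 0 otherwise.
    """
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    by_forecast = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        by_forecast[f].append(o)

    # The factor of 2 reflects the two-state vector: both components
    # contribute the same squared error.
    brier = 2 * sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / n
    uncertainty = 2 * base_rate * (1 - base_rate)
    reliability = 2 * sum(
        len(obs) * (f - sum(obs) / len(obs)) ** 2
        for f, obs in by_forecast.items()
    ) / n
    resolution = 2 * sum(
        len(obs) * (sum(obs) / len(obs) - base_rate) ** 2
        for obs in by_forecast.values()
    ) / n
    return brier, uncertainty, reliability, resolution


# Hypothetical data: four forecasts of .8 and four of .3 for state A.
brier, unc, rel, res = murphy_partition(
    [0.8, 0.8, 0.8, 0.8, 0.3, 0.3, 0.3, 0.3],
    [1, 1, 1, 0, 0, 0, 1, 0],
)
```

With these inputs the components are approximately .500, .005 and .125, and the
Brier score is approximately .380. The scaling gives the ranges [0, 2] for the
Brier score and [0, .5] for uncertainty.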


(Insert  Table  3 about  here.) 


Table 3

Proper Scoring Rules Analysis of Subjects'
Assessments of Specified versus Unspecified Sets

                          Partition
            Uncertainty  Reliability  Resolution    Brier    Confidence
Control     .500         .222         [illegible]   .[?]20   -.66
Exemplar    .500         .115         .006          .608     -.47
Retrieval   .500         .145         .008          .637     -.53

The Brier score and reliability component each have a range
[0, 2], with smaller scores being preferred. Smaller scores are
preferable for the uncertainty component also; this component has
a range of [0, .5]. The resolution component has a range of
[0, .5], and larger scores are preferred. The Brier score is
Uncertainty + Reliability - Resolution.

The confidence score is not a component of the Brier score. The
preferred score is 0 and the range of possible values is [-1, 1].

This analysis was done in terms of problem means for each group. The scores
describe how well the problem means for each group characterize the population
parameters. Murphy's (1973) approach was modified to calculate the scores
shown in Table 3. Murphy used vectors having all zero entries except for a
"1" representing the state of the world which obtained. We were able to employ
vectors having entries corresponding to the population parameters. Our guess
is that the effect of this modification is to reduce the variability in
computed scores. However, as noted by Lichtenstein and Fischhoff (1977), the
distributions of the Brier score and its partitions are unknown at the present
time. Murphy (1974) discussed a closely related issue in the context of another
scoring rule. Our analysis was in terms of two-state vectors (specified set,
unspecified (catch-all) set) and the interval size was set to ten percent.

The uncertainty component was the maximum of .5 for each group. The difference
between the theoretical maximum of .5 and the computed scores was in the fifth
decimal place. Since this component is a property of the environment (Murphy,
1973) and each group was presented the same collection of 30 problems, the
uncertainty score should not vary across groups. The magnitude of the
uncertainty score was interpreted to indicate that we had achieved a modicum of
success in our attempt to choose problems having catch-all probabilities which
varied over a large number of values, with neither large nor small values
favored.

Compared to the Control group, the reliability component decreased (improved)
for both experimental groups. The reliability scores are clearly the component
most influencing the differences among groups in total Brier scores. The
reliability component is related to calibration as discussed in the context of
the regression analysis. The Exemplar group's reliability score being the
best of the three is in agreement with the regression analysis.


The resolution scores were so nearly identical that differences between them
may be attributed to chance. However, both experimental groups had larger
(better) resolution scores than the Control group.

Also listed in Table 3 are the confidence scores, a metric suggested by
Lichtenstein and Fischhoff (1977), which is related to the reliability
component, but which is not part of the Brier score. All three groups
exhibited negative confidence scores, indicating excessiveness in specified set
estimates (conservatism in unspecified set estimates). The ordering among the
three groups is the same as suggested by the overall group mean catch-all
responses of Table 1.


(Insert Table 4 about here.)


To further examine the nature of the availability heuristic in hypothesis
generation, an analysis of the hypotheses suggested by Retrieval group subjects
was carried out. Table 4 is a summary of this analysis. Subjects in the
Retrieval condition were instructed to respond with every possible major in the
catch-all alternative having a probability greater than zero. Table 4 documents
the difficulty subjects encountered on this subtask. Although the overall mean
catch-all probability was actually 48.87 percent, the mean veridical
probability of the sets of catch-all hypotheses subjects generated was only
6.25 percent.

One explanation for the very low probability of catch-all sets generated by
subjects may be that, while the average number of hypotheses actually contained
in the catch-all sets was 23.13, subjects were limited by the software to
entering no more than five possibilities. However, subjects were usually
satisfied with sets of possible catch-all hypotheses numbering far fewer than
five. The average number of hypotheses in subjects' catch-all sets was only
1.87. Apparently subjects could access in memory only approximately eight
percent of the catch-all possibilities in this admittedly difficult task. We
believe that this result is compelling evidence that most catch-all hypotheses
were not available to subjects in this task.

To examine the effect of this heuristic from another perspective, an additional
correlational analysis was undertaken. This analysis was done by problem for
each subject in the Retrieval group. The actual probability of the catch-all
set the subject generated was substituted for the veridical probability of the
entire catch-all set. With this exception, the calculations were carried out
in the same manner as those summarized in Table 2. The Pearson correlation
coefficient calculated was .289; the square of this correlation was .084. The
regression line had a slope of .201 and an intercept of .331. Eleven data
pairs were deleted because the subject's log odds were undefined and an
additional 87 data pairs were deleted because the veridical log odds were
undefined (i.e., the subject entered no catch-all possibilities). As a result
of these deletions, the correlation statistics were descriptive of 382 total
scores. This simple manipulation nearly doubled the squared correlation, from
[remainder missing in source]


Conclusions 


The major conclusion of this study is that our "availability explanation"
conjecture was supported by the data. Two independent manipulations designed
to increase the availability of hypotheses in the catch-all alternative each
served to decrease subjects' overconfidence in specified hypotheses, resulting
in more veridical estimates overall. This change in subjects' probabilistic
estimates was obtained for either of two manipulations which have no effect on
the veridical probabilities.

It is clear that either experimenter-supplied exemplars for the catch-all or
subject-generated exemplars reduce the bias of plausibility estimates. If the
Exemplar manipulation had involved populating the catch-all alternative with
more than five hypotheses, this bias might have been reduced still further.
The study did not address the extent to which availability, as we have defined
it, explains the totality of the observed nonoptimal performance. It may be
that other factors contribute to overconfidence in hypothesis generation tasks,
for example, those mentioned by Tversky and Kahneman (in press) to explain this
bias in other contexts. However, increasing the availability of catch-all
hypotheses does decrease this bias. Either experimental procedure could be
implemented in practical hypothesis generation to increase the quality of
subsequent decision analyses.